Project: IMDb Dataset Analysis

Table of Contents

Introduction

This is the IMDb dataset from Kaggle, already cleaned. It contains information about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue.

We're going to investigate it. I'm going to use the adjusted values for revenue and budget so we can better compare performance across different years.

First we will clean up the data, then explore profits by genre and the actors who have been part of the most profitable films, then dive deeper into the changes over time and their possible causes.

Note: The adjusted revenue field is not complete, so in this file I've used a library to adjust for inflation and get more data points. It turns out the mean (average) values used in the previous file were representative, but I had to double check.\
I will attach the old file (I haven't cleaned its markdown, but it has all the visualization and analysis).

Importing Libraries

Data Wrangling

General Properties

Adjusting for inflation ourselves using CPI: https://github.com/datadesk/cpi

Keep in mind this hasn't been updated for years; I'm just using it to keep everything to one standard.
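To make the adjustment concrete, here is a minimal sketch of the arithmetic behind a CPI adjustment: a nominal amount is scaled by the ratio of the target year's index to the original year's index. The CPI numbers, the `adjust_for_inflation` helper and the column names (`release_year`, `revenue`, `revenue_adj`) are assumptions made for illustration, not official figures or the exact code used here. The cpi package linked above wraps the same idea (its README documents an `inflate` function), so this only shows what happens underneath.

```python
import pandas as pd

# Illustrative CPI index values (placeholders, NOT official figures).
CPI_BY_YEAR = {1980: 80.0, 2000: 170.0, 2020: 260.0}
TARGET_YEAR = 2020


def adjust_for_inflation(amount, year, target_year=TARGET_YEAR, cpi=CPI_BY_YEAR):
    """Scale a nominal amount from `year` into `target_year` dollars."""
    return amount * cpi[target_year] / cpi[year]


# Toy data; titles and column names are hypothetical.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "release_year": [1980, 2000],
    "revenue": [1_000_000, 1_000_000],
})

# Adjust every row to the same target year so the values are comparable.
movies["revenue_adj"] = [
    adjust_for_inflation(revenue, year)
    for revenue, year in zip(movies["revenue"], movies["release_year"])
]
print(movies)
```

The same nominal revenue ends up larger for the older movie, which is exactly why adjusted values are the fairer basis for comparing years.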

We can see a lot of empty records in fields like homepage and production companies.

A lot of these columns are not needed for my analysis, so I'm just going to drop them.
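A minimal sketch of that step, assuming the data is in a pandas DataFrame. The toy rows and the column names other than homepage and production companies are hypothetical, and the real list of dropped columns may well be longer.

```python
import pandas as pd

# Toy stand-in for the movies table (values are made up).
movies = pd.DataFrame({
    "original_title": ["Movie A", "Movie B", "Movie C"],
    "homepage": [None, "http://example.com", None],
    "production_companies": ["Studio X", None, None],
    "revenue_adj": [1.0e6, 2.5e6, 4.0e5],
})

# Count missing values per column to see which fields are sparse.
missing_per_column = movies.isnull().sum()
print(missing_per_column)

# Drop the columns the analysis does not use (assumed list).
unused_columns = ["homepage", "production_companies"]
movies = movies.drop(columns=unused_columns)
print(movies.columns.tolist())
```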

It seems like there are still some empty records; let's check them.
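One way to look at the remaining gaps, sketched on toy data: select the rows that have at least one missing value, inspect them, and then keep only the complete rows. Whether to drop or fill those rows is a judgment call; dropping is assumed here, and the column names are hypothetical.

```python
import pandas as pd

# Toy data with a few gaps (values are made up).
movies = pd.DataFrame({
    "original_title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Action|Drama", None, "Comedy"],
    "cast": ["Actor 1|Actor 2", "Actor 3", None],
})

# Rows that still have at least one missing value.
rows_with_gaps = movies[movies.isnull().any(axis=1)]
print(rows_with_gaps)

# Keep only complete rows and renumber the index.
clean = movies.dropna().reset_index(drop=True)
print(clean)
```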

Now that the dataset is clean, let's do some more manipulation of the cast and genre columns, splitting by "|" and putting each value in its own row so we can better analyze them. We will create two new dataframes, one for genres and one for cast, plus another that splits both.

Will also create a dataframe with all of those fields split.
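The splitting described above can be sketched with `str.split` followed by `explode`, which turns each list element into its own row. The `split_rows` helper and the column names are assumptions for illustration.

```python
import pandas as pd

# Toy data in the pipe-delimited layout described above.
movies = pd.DataFrame({
    "original_title": ["Movie A", "Movie B"],
    "genres": ["Action|Adventure", "Drama"],
    "cast": ["Actor 1|Actor 2", "Actor 3"],
})


def split_rows(df, column, sep="|"):
    """Return a copy with one row per value of a `sep`-delimited column."""
    return (
        df.assign(**{column: df[column].str.split(sep)})
          .explode(column)
          .reset_index(drop=True)
    )


by_genre = split_rows(movies, "genres")        # one row per movie/genre
by_cast = split_rows(movies, "cast")           # one row per movie/actor
by_genre_and_cast = split_rows(by_genre, "cast")  # every genre/actor pair
print(by_genre_and_cast)
```

Note that the combined frame grows multiplicatively: a movie with two genres and two cast members becomes four rows, so sums over it count the same movie several times.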

Exploratory Data Analysis

NOTE: There will be overlap and duplicates in the upcoming figures, given that each movie has several genres and several cast members.

Let's find out revenue by genre.
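A minimal sketch of the aggregation behind a revenue-by-genre chart, assuming a frame with one row per movie/genre pair and an adjusted revenue column named `revenue_adj` (both assumptions; the numbers are made up).

```python
import pandas as pd

# One row per movie/genre pair, so "Movie A" appears twice.
by_genre = pd.DataFrame({
    "original_title": ["Movie A", "Movie A", "Movie B", "Movie C"],
    "genres": ["Action", "Adventure", "Drama", "Action"],
    "revenue_adj": [300.0, 300.0, 100.0, 200.0],
})

# Total adjusted revenue per genre, largest first.
revenue_by_genre = (
    by_genre.groupby("genres")["revenue_adj"]
            .sum()
            .sort_values(ascending=False)
)
print(revenue_by_genre)
```

The same pattern works for any split column: grouping by a cast column instead and taking `.head(20)` gives a top 20 list of actors.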

Action, Adventure, Drama, Comedy and Thriller movies bring in the most revenue, and it seems like action movies are the most profitable.

Let's find out movie revenue for the top 20 actors.

In line with the previous chart, the top two actors are well-known action actors.

As expected, the list is dominated by actors and actresses whose movies are mostly in the top five genres.

Let's find the top 10 movies by profit.
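As a rough sketch, profit here is taken to be adjusted revenue minus adjusted budget, and the ranking uses `nlargest`. The column names `budget_adj`, `revenue_adj` and `profit_adj` and the numbers are assumptions for illustration.

```python
import pandas as pd

# Toy data; one row per movie (no genre or cast splitting here).
movies = pd.DataFrame({
    "original_title": ["Movie A", "Movie B", "Movie C"],
    "budget_adj": [100.0, 50.0, 200.0],
    "revenue_adj": [500.0, 60.0, 900.0],
})

# Profit in inflation-adjusted dollars.
movies["profit_adj"] = movies["revenue_adj"] - movies["budget_adj"]

# The 10 most profitable movies (fewer if the table is smaller).
top_movies = movies.nlargest(10, "profit_adj")
print(top_movies[["original_title", "budget_adj", "revenue_adj", "profit_adj"]])
```

Ranking on the unsplit table matters: on a frame with one row per genre or actor, the same movie would fill several of the top slots.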

Action and Adventure movies dominate the top movies, as expected.

Comparing adjusted revenue to adjusted budget for the top 10 movies.

We can see amazing performance by the top five movies.

It should be said that most of these movies were pioneers for their genres in their time, e.g.:\
Avatar, whose VFX amazed audiences.\
Star Wars and Jaws in the 70s, which popularized CGI.\
The Avengers, which made superhero and comic-book movies mainstream.

Time series analysis

Preparing the date
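A minimal sketch of this preparation, assuming a `release_date` column stored as ISO-formatted strings (the real file may use a different format, so the `format` argument would need to match). Profit margin is assumed here to mean profit divided by revenue; the column names and numbers are made up.

```python
import pandas as pd

# Toy data; dates are assumed to be ISO strings.
movies = pd.DataFrame({
    "original_title": ["Movie A", "Movie B", "Movie C"],
    "release_date": ["1975-06-20", "1975-12-01", "2009-12-18"],
    "budget_adj": [100.0, 200.0, 400.0],
    "revenue_adj": [500.0, 400.0, 1200.0],
})

# Parse the strings into real datetimes and pull out the year.
movies["release_date"] = pd.to_datetime(movies["release_date"], format="%Y-%m-%d")
movies["release_year"] = movies["release_date"].dt.year

# Assumed definition: margin = (revenue - budget) / revenue.
movies["profit_margin"] = (
    (movies["revenue_adj"] - movies["budget_adj"]) / movies["revenue_adj"]
)

# Average margin and number of movies per release year.
margin_by_year = movies.groupby("release_year")["profit_margin"].mean()
movies_per_year = movies.groupby("release_year").size()
print(margin_by_year)
print(movies_per_year)
```

Keeping the per-year movie count next to the average margin helps when reading the trend, since years with only a handful of movies produce much noisier averages.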

We can see profitability margins declining over time. That is most likely caused by having more movies in later years, which reflects both that more movies are being made today and that the dataset probably doesn't include all the old movies.

We can also see a HUGE dip in profitability in the 60s, most likely caused by televisions becoming more affordable.

We also see a big hike in profitability from the second half of the 60s up to the mid 80s, because of the great movies made in that period (like The Godfather and The Exorcist).\
This is also partially caused by technological advancement in CGI: movies like Star Wars, Jaws and E.T. shook the whole industry and popularized the use of CGI in later movies and in science fiction movies and series.

We will confirm this in our next graphs.

We can see that the 1960s to 1980s was a very good period for the film industry, as well as the great performance of Avatar.

In this graph we can clearly see how popular Action and Adventure movies are.

Keep in mind that this isn't the most accurate representation of the genres; for example, The Exorcist is counted as Drama and Star Wars as Adventure.\
The most likely reason is that the genres are skewed toward alphabetical order.
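One way to check that suspicion, sketched on made-up rows: if a movie is assigned only its first listed genre, and the genre strings happen to be stored alphabetically, then genres early in the alphabet get over-counted. The sketch below assumes the pipe-delimited `genres` layout and that a single genre per movie is taken from the front of the list, which may not match how the graph was actually built.

```python
import pandas as pd

# Hypothetical movies with pipe-delimited genre strings.
movies = pd.DataFrame({
    "original_title": ["Movie A", "Movie B"],
    "genres": ["Drama|Horror|Thriller", "Adventure|Science Fiction"],
})

genre_lists = movies["genres"].str.split("|")

# The genre a "first listed" rule would assign to each movie.
movies["first_genre"] = genre_lists.str[0]

# True where the stored genre order is already alphabetical.
movies["is_alphabetical"] = genre_lists.apply(lambda g: g == sorted(g))

print(movies)
print("Share of alphabetical genre lists:", movies["is_alphabetical"].mean())
```

If the share is close to 1 on the real data, the genre order carries no information about a movie's main genre, and counting every listed genre is the safer choice.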

Conclusions

Findings:\
Action, Adventure, Drama, Comedy and Thriller are the most dominant genres.\
The late 60s to mid 80s was the best period for the movie industry.\
More movies are being made today.\
Technological advancement has a big effect on the industry: the affordability of TVs affected it negatively in the 60s, while CGI affected it positively in the 70s and 80s, as did Avatar with its improved CGI and 3D technology.

Mohamed Amr.